@dstengle
Owner

Summary

This PR makes Docker runtime paths configurable through environment variables and removes the spacy dependency, whose transitive srsly dependency was causing compilation failures with cloud native buildpacks.

Problem Solved

The original issue was that srsly (a spaCy dependency) contains C extensions that require compilation, which fails in cloud native buildpack environments. Additionally, the CLI was hardcoded to specific paths, which made Docker usage difficult.

Key Changes

🔧 Dependency Management

  • Removed spacy dependency - Eliminates srsly compilation issues completely
  • Moved SPARQLWrapper from test to main dependencies (required at runtime)
  • Updated poetry.lock with clean dependency tree

🐳 Docker Runtime Configuration

  • Environment variable support for all paths:
    • KBP_WORK_DIR - Working directory
    • KBP_HOME - Configuration directory (replaces hardcoded ~/.kbp)
    • KBP_CONFIG_PATH - Custom config file location
    • KBP_KNOWLEDGE_BASE_PATH - Documents directory
    • KBP_METADATA_STORE_PATH - Metadata storage location
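
As a rough illustration of how these variables might be consumed, here is a minimal sketch. It is not the project's actual code: `RuntimePaths`, `resolve_runtime_paths`, and every default except `~/.kbp` are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class RuntimePaths:
    """Hypothetical container for the five KBP_* locations."""
    work_dir: Path
    home: Path
    config_path: Optional[Path]
    knowledge_base_path: Path
    metadata_store_path: Path


def _from_env(env: Mapping[str, str], name: str, default: Callable[[], Path]) -> Path:
    # Use the variable when it is set and non-empty; otherwise compute the default lazily.
    value = env.get(name)
    return Path(value).expanduser() if value else default()


def resolve_runtime_paths(env: Mapping[str, str] = os.environ) -> RuntimePaths:
    """Resolve KBP_* variables, falling back to assumed defaults."""
    work_dir = _from_env(env, "KBP_WORK_DIR", Path.cwd)
    # ~/.kbp is the previously hardcoded location named in this PR.
    home = _from_env(env, "KBP_HOME", lambda: Path.home() / ".kbp")
    config = env.get("KBP_CONFIG_PATH")
    return RuntimePaths(
        work_dir=work_dir,
        home=home,
        config_path=Path(config).expanduser() if config else None,
        # Assumed defaults: documents live in the working directory,
        # metadata lives in a "metadata" folder under the configuration directory.
        knowledge_base_path=_from_env(env, "KBP_KNOWLEDGE_BASE_PATH", lambda: work_dir),
        metadata_store_path=_from_env(env, "KBP_METADATA_STORE_PATH", lambda: home / "metadata"),
    )
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment, for example `resolve_runtime_paths({"KBP_WORK_DIR": "/workspace"})`.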

🛠️ New Docker Tooling

  • scripts/docker-run.sh - User-friendly wrapper script with host networking
  • docker-compose.app.yml - Production-ready compose configuration
  • README-docker.md - Comprehensive Docker usage guide
  • Multi-stage Dockerfile - Optimized for both compilation and runtime

📁 Path Flexibility

  • Configurable .kbp location - No longer hardcoded to specific paths
  • Volume mount support - Works with any directory structure
  • Config file detection - Multiple fallback locations with env var priority
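
The fallback behaviour could look something like the sketch below. The search order after `KBP_CONFIG_PATH`, the `config.yaml` file name, and the function name `find_config_file` are all assumptions for illustration; only the "env var first, then fallbacks" priority comes from this PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def find_config_file(
    work_dir: Path,
    env: Mapping[str, str] = os.environ,
    filenames: Sequence[str] = ("config.yaml",),  # hypothetical file name
) -> Optional[Path]:
    """Return the config file to load, or None when no candidate exists."""
    # 1. An explicit KBP_CONFIG_PATH always wins. It is returned even if the
    #    file is missing so the caller can report a clear error.
    explicit = env.get("KBP_CONFIG_PATH")
    if explicit:
        return Path(explicit).expanduser()

    # 2. Otherwise walk the fallback directories in priority order (assumed order).
    search_dirs = []
    if env.get("KBP_HOME"):
        search_dirs.append(Path(env["KBP_HOME"]).expanduser())
    search_dirs.append(work_dir / ".kbp")
    search_dirs.append(Path.home() / ".kbp")

    for directory in search_dirs:
        for name in filenames:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None
```

Returning `None` instead of raising leaves the decision about built-in defaults to the caller.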

Testing

```bash
# Build and test Docker image
docker build -t knowledgebase-processor:latest .

# Test with environment variables
docker run --rm -v "$(pwd):/workspace" \
  -e KBP_WORK_DIR=/workspace \
  -e KBP_HOME=/workspace/.kbp \
  -w /workspace \
  knowledgebase-processor:latest kb --help

# Test wrapper script
./scripts/docker-run.sh init
./scripts/docker-run.sh scan
```

Usage Examples

Quick Start

```bash
# Initialize and scan documents
./scripts/docker-run.sh init
./scripts/docker-run.sh scan

# With custom directory
./scripts/docker-run.sh -w ~/Documents init

# Continuous monitoring with host network access
./scripts/docker-run.sh publish --watch
```

Docker Compose

```bash
# Interactive mode
docker-compose -f docker-compose.app.yml up kbp

# Watch mode with Fuseki
docker-compose -f docker-compose.app.yml up fuseki kbp-watch
```

Benefits

  • No more compilation issues - Cloud native buildpacks now work
  • Flexible Docker deployment - Works with any directory structure
  • Host network connectivity - Can access local SPARQL endpoints
  • Configurable paths - Adapts to different runtime environments
  • Production ready - Complete tooling and documentation

Breaking Changes

  • spacy removed - EntityRecognizer was disabled by default anyway
  • Config resolution order - Environment variables now take priority

Migration Guide

For users currently using spacy/entity recognition:

  1. This feature was disabled by default (`analyze_entities: false`)
  2. If needed, it can be re-enabled by installing spacy separately
  3. Most users won't be affected as this was an optional feature
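
One way an optional dependency like this can be guarded is sketched below. The helper names are hypothetical and this is not taken from the project; only the `analyze_entities` setting and the idea of installing spacy separately come from this PR.

```python
import importlib.util


def spacy_available() -> bool:
    """Return True when the optional spacy package is importable."""
    return importlib.util.find_spec("spacy") is not None


def entity_analysis_enabled(config: dict) -> bool:
    """Decide whether entity recognition should run for this config."""
    # The feature is off unless analyze_entities is explicitly true.
    requested = bool(config.get("analyze_entities", False))
    if requested and not spacy_available():
        # Fail early with an actionable message instead of an ImportError later.
        raise RuntimeError(
            "analyze_entities is true but spacy is not installed; "
            "install spacy separately to re-enable entity recognition"
        )
    return requested
```

With the default configuration the check never touches spacy, so users who leave the feature disabled see no change.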

For Docker users:

  1. Use the new wrapper script: `./scripts/docker-run.sh`
  2. Set environment variables for custom paths if needed
  3. See `README-docker.md` for comprehensive examples

🤖 Generated with Claude Code

dstengle and others added 2 commits July 24, 2025 01:47
## Changes Made

### 🐛 Dependency Issues Fixed
- Remove spacy dependency to eliminate srsly C compilation issues
- Move SPARQLWrapper from test to main dependencies
- Update poetry.lock to reflect dependency changes

### 🐳 Docker Runtime Improvements
- Add comprehensive environment variable support for flexible path configuration
- Support KBP_WORK_DIR, KBP_HOME, KBP_CONFIG_PATH, and other runtime variables
- Create multi-stage Dockerfile with proper C extension handling
- Add host networking support for local service connectivity

### 🛠️ New Docker Tooling
- Add docker-run.sh wrapper script for easy Docker usage
- Create docker-compose.app.yml for persistent services
- Add comprehensive Docker usage documentation in README-docker.md
- Support volume mounting with proper path resolution

### 📁 Path Configuration Enhancements
- Make .kbp directory location configurable via environment variables
- Support custom config file paths through KBP_CONFIG_PATH
- Improve config file detection with multiple fallback locations
- Enable flexible working directory configuration

### 🚀 Deployment Ready
- GitHub Actions workflow for multi-platform artifact building
- Support for Docker images, Python wheels, and standalone executables
- Cloud native buildpack configuration (project.toml)
- Comprehensive troubleshooting and usage examples

## Breaking Changes
- spacy dependency removed (EntityRecognizer disabled by default anyway)
- Config path resolution now prioritizes environment variables

## Benefits
- ✅ Eliminates srsly compilation issues in cloud builds
- ✅ Docker containers work with any directory structure
- ✅ Easy deployment across different environments
- ✅ Configurable paths for various runtime scenarios
- ✅ Network connectivity to host services (SPARQL endpoints)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@dstengle dstengle merged commit 4035c23 into main Jul 24, 2025
2 checks passed
@dstengle dstengle deleted the feature/kb-publish-unified-workflow branch July 24, 2025 02:00